1 On increasing architecture awareness in program optimizations to
bridge the gap between peak and sustained processor performance:
matrix-multiply revisited

David Parello, Olivier Temam, Jean-Marie Verdun 

November 2002 Supercomputing '02: Proceedings of the 2002 ACM/IEEE
conference on Supercomputing
Publisher: IEEE Computer Society Press 

Full text available: pdf (263.32 KB)
Additional Information: full citation, abstract, references, cited by,
index terms

As the complexity of processor architectures increases, there is a
widening gap between peak processor performance and sustained 
processor performance so that programs now tend to exploit only a 
fraction of available performance. While there is a tremendous ... 



2 Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on
Language, compilers, and tool support for embedded systems
Mary Jane Irwin, Koen De Bosschere 
June 2006 proceeding 
Publisher: ACM 

Additional Information: full citation, abstract

It is our great pleasure to welcome you to the ACM SIGPLAN/SIGBED 
Conference on Languages, Compilers, and Tools for Embedded Systems 
— LCTES 2006. This year's conference continues its tradition of being the 
premier forum for presentation of research ... 



3 Compiler-directed page coloring for multiprocessors
Edouard Bugnion, Jennifer M. Anderson, Todd C. Mowry, Mendel
Rosenblum, Monica S. Lam 

October 1996 ASPLOS-VII: Proceedings of the seventh international 
conference on Architectural support for programming 
languages and operating systems 
Publisher: ACM

Full text available: pdf (1.37 MB)
Additional Information: full citation, abstract, references, cited by,
index terms

This paper presents a new technique, compiler-directed page coloring, 
that eliminates conflict misses in multiprocessor applications. It enables 
applications to make better use of the increased aggregate cache size 
available in a multiprocessor. ...



4 A comparison of empirical and model-driven optimization
Kamen Yotov, Xiaoming Li, Gang Ren, Michael Cibulskis, Gerald DeJong, 
Maria Garzaran, David Padua, Keshav Pingali, Paul Stodghill, Peng Wu 
June 2003 PLDI '03: Proceedings of the ACM SIGPLAN 2003 conference on
Programming language design and implementation
Publisher: ACM 

Full text available: pdf (448.74 KB)
Additional Information: full citation, abstract, references, cited by,
index terms

Empirical program optimizers estimate the values of key optimization 
parameters by generating different program versions and running them 
on the actual hardware to determine which values give the best 
performance. In contrast, conventional compilers ... 

Keywords: BLAS, blocking, code generation, compilers, empirical 
optimization, memory hierarchy, model-driven optimization, program 
transformation, tiling, unrolling 



5 Compiler-directed page coloring for multiprocessors
Edouard Bugnion, Jennifer M. Anderson, Todd C. Mowry, Mendel 
Rosenblum, Monica S. Lam 

December 1996 ASPLOS-VII: ACM SIGOPS Operating Systems Review,
Volume 30 Issue 5

Publisher: ACM 

Full text available: pdf (1.37 MB)
Additional Information: full citation, abstract, references, cited by,
index terms

This paper presents a new technique, compiler-directed page coloring, 
that eliminates conflict misses in multiprocessor applications. It enables 
applications to make better use of the increased aggregate cache size 
available in a multiprocessor. ... 



6 Value reuse optimization: reuse of evaluated math library function
calls through compiler generated cache
K. V. Seshu Kumar

August 2003 ACM SIGPLAN Notices, Volume 38 issue 8 

Publisher: ACM 

Full text available: pdf (880.20 KB)
Additional Information: full citation, abstract, references, cited by

Value reuse technique eliminates the redundant evaluation of 
expressions, using the support of hardware at runtime to eliminate 
them. The potential performance of a value reuse mechanism not only 
depends on the number of instances it has eliminated, ...

Keywords: Compilers Optimization, Function Cache, Function reuse, 
Instruction reuse 



7 Compiler optimizations for nondeferred reference-counting garbage
collection
Pramod G. Joisha

June 2006 ISMM '06: Proceedings of the 5th international symposium on 
Memory management 

Publisher: ACM 

Full text available: pdf (220.00 KB)
Additional Information: full citation, abstract, references, index terms

Reference counting is a well-known technique for automatic memory 
management, offering unique advantages over other forms of garbage 
collection. However, on account of the high costs associated with the 
maintenance of up-to-date tallies of references ... 

Keywords: reference counting, static analyses 



8 A study of devirtualization techniques for a Java Just-In-Time
compiler
Kazuaki Ishizaki, Motohiro Kawahito, Toshiaki Yasue, Hideaki Komatsu,
Toshio Nakatani 

October 2000 OOPSLA '00: Proceedings of the 15th ACM SIGPLAN 
conference on Object-oriented programming, systems, 
languages, and applications 

Publisher: ACM 

Full text available: pdf (225.89 KB)
Additional Information: full citation, abstract, references, cited by,
index terms

Many devirtualization techniques have been proposed to reduce the 
runtime overhead of dynamic method calls for various object-oriented 
languages, however, most of them are less effective or cannot be 
applied for Java in a straightforward manner. This ... 



9 Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on
Languages, compilers, and tools
Santosh Pande, Zhiyuan Li
June 2007 proceeding 
Publisher: ACM 

Additional Information: full citation, abstract

It is with great pleasure that we welcome you to the ACM 2007 
Conference on Languages Compilers and Tools for Embedded Systems 
(LCTES'07) on behalf of its organizational committees. The aim of LCTES 
is to provide a premier forum for sharing the ... 



10 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on
Languages, compilers, and tools for embedded systems
Yunheung Paek, Rajiv Gupta

June 2005 proceeding 

Publisher: ACM 

Additional Information: full citation, abstract

It is our great pleasure to welcome you to the ACM SIGPLAN/SIGBED 
Conference on Languages, Compilers, and Tools for Embedded Systems
— LCTES'05. This year's conference continues its tradition of being the 
premier forum for presentation of research ... 



11 Design, implementation, and evaluation of optimizations in a
just-in-time compiler

Kazuaki Ishizaki, Motohiro Kawahito, Toshiaki Yasue, Mikio Takeuchi, 
Takeshi Ogasawara, Toshio Suganuma, Tamiya Onodera, Hideaki Komatsu, 
Toshio Nakatani 

June 1999 JAVA '99: Proceedings of the ACM 1999 conference on Java
Grande
Publisher: ACM 

Full text available: pdf (1.09 MB)
Additional Information: full citation, references, cited by



12 A compiler framework for restructuring data declarations to enhance
cache and TLB effectiveness 

David F. Bacon, Jyh-Herng Chow, Dz-ching R. Ju, Kalyan Muthukumar, 
Vivek Sarkar 

October 1994 CASCON '94: Proceedings of the 1994 conference of the
Centre for Advanced Studies on Collaborative research
Publisher: IBM Press 

Full text available: pdf (298.15 KB)
Additional Information: full citation, abstract, references, cited by,
index terms

It has been observed that memory access performance can be improved 
by restructuring data declarations, using simple transformations such as 
array dimension padding and inter-array padding (array alignment) to 
reduce the number of misses in the cache ... 



13 Loop fusion for memory space optimization 
Antoine Fraboulet, Karen Kodary, Anne Mignotte 

September 2001 ISSS '01: Proceedings of the 14th International symposium 
on Systems synthesis 

Publisher: ACM 

Full text available: pdf (52.91 KB)
Additional Information: full citation, abstract, references, cited by,
index terms

Portable or embedded systems as well as submicronic technologies have 
made the power consumption criterium crucial. Memory is known to be
extremely power consuming. Moreover multimedia applications are
memory intensive applications. Therefore, we propose ... 



14 A study of devirtualization techniques for a Java Just-In-Time
compiler

Kazuaki Ishizaki, Motohiro Kawahito, Toshiaki Yasue, Hideaki Komatsu, 
Toshio Nakatani 

October 2000 OOPSLA '00: ACM SIGPLAN Notices, volume 35 issue 10
Publisher: ACM 

Full text available: pdf (225.89 KB)
Additional Information: full citation, abstract, references, cited by,
index terms

Many devirtualization techniques have been proposed to reduce the 
runtime overhead of dynamic method calls for various object-oriented 
languages, however, most of them are less effective or cannot be 
applied for Java in a straightforward manner. This ... 



15 Stack allocation and synchronization optimizations for Java using 
escape analysis
Jong-Deok Choi, Manish Gupta, Mauricio J. Serrano, Vugranam C. Sreedhar,
Samuel P. Midkiff 

November 2003 ACM Transactions on Programming Languages and 
Systems (TOPLAS), volume 25 issue 6 

Publisher: ACM 

Full text available: pdf (632.85 KB)
Additional Information: full citation, abstract, references, cited by,
index terms, review

This article presents an escape analysis framework for Java to determine 
(1) if an object is not reachable after its method of creation returns, 
allowing the object to be allocated on the stack, and (2) if an object is 
reachable only from a single ... 

Keywords: Connection graphs, escape analysis, points-to graph 



16 Data size optimizations for java programs 
C. Scott Ananian, Martin Rinard
July 2003 LCTES '03: ACM SIGPLAN Notices, volume 38 issue 7
Publisher: ACM 

Full text available: pdf (349.36 KB)
Additional Information: full citation, abstract, references, cited by,
index terms

We present a set of techniques for reducing the memory consumption of 
object-oriented programs. These techniques include analysis algorithms 
and optimizations that use the results of these analyses to eliminate 
fields with constant values, reduce the ... 

Keywords: bitwidth analysis, embedded systems, field externalization, 
field packing, size optimizations, static specialization 



17 A comparison of empirical and model-driven optimization

Kamen Yotov, Xiaoming Li, Gang Ren, Michael Cibulskis, Gerald DeJong, 
Maria Garzaran, David Padua, Keshav Pingali, Paul Stodghill, Peng Wu
May 2003 PLDI '03: ACM SIGPLAN Notices, volume 38 issue 5 
Publisher: ACM 

Full text available: pdf (448.74 KB)
Additional Information: full citation, abstract, references, cited by,
index terms

Empirical program optimizers estimate the values of key optimization 
parameters by generating different program versions and running them 
on the actual hardware to determine which values give the best 
performance. In contrast, conventional compilers ... 

Keywords: BLAS, blocking, code generation, compilers, empirical 
optimization, memory hierarchy, model-driven optimization, program
transformation, tiling, unrolling 



18 Data size optimizations for java programs 
C. Scott Ananian, Martin Rinard
June 2003 LCTES '03: Proceedings of the 2003 ACM SIGPLAN conference on
Language, compiler, and tool for embedded systems 
Publisher: ACM 

Full text available: pdf (349.36 KB)
Additional Information: full citation, abstract, references, cited by,
index terms

We present a set of techniques for reducing the memory consumption of 
object-oriented programs. These techniques include analysis algorithms 
and optimizations that use the results of these analyses to eliminate 
fields with constant values, reduce the ... 

Keywords: bitwidth analysis, embedded systems, field externalization, 
field packing, size optimizations, static specialization 



19 Compiler optimization-space exploration 

Spyridon Triantafyllis, Manish Vachharajani, Neil Vachharajani, David I. 
August 

March 2003 CGO '03: Proceedings of the international symposium on Code 
generation and optimization: feedback-directed and runtime 
optimization 

Publisher: IEEE Computer Society

Full text available: pdf (1.19 MB)
Additional Information: full citation, abstract, references, cited by,
index terms

To meet the demands of modern architectures, optimizing compilers 
must incorporate an ever larger number of increasingly complex 
transformation algorithms. Since code transformations may often 
degrade performance or interfere with subsequent transformations, ... 



20 Compiler transformations for high-performance computing 
David F. Bacon, Susan L. Graham, Oliver J. Sharp 
December 1994 ACM Computing Surveys (CSUR), volume 26 issue 4
Publisher: ACM

Full text available: pdf (6.32 MB)
Additional Information: full citation, abstract, references, cited by,
index terms, review

In the last three decades a large number of compiler transformations for
optimizing programs have been implemented. Most optimizations for 
uniprocessors reduce the number of instructions executed by the 
program using transformations based on the analysis ... 

Keywords: compilation, dependence analysis, locality, multiprocessors, 
optimization, parallelism, superscalar processors, vectorization 